Predicting Urban Growth 2020-2030

Having built a model that predicts urban growth from the predictors described in previous sections, we can now extrapolate past experience into the future. We apply the model from the previous section to 2020 data to predict urban growth during 2020-2030.

A key issue here is development allocation. Granted that the model can output scores for each cell, how do we determine which cells are more likely to be actually developed than others? We use two approaches here:

  • Allocation according to population projection. This approach assumes that newly developed acreage is proportional to population growth, and that the population projections are accurate. The population projection is obtained from the State of Colorado’s data portal.

  • Allocation by threshold. We use the threshold found to be effective in the previous section, 0.75, as the cutoff point. The caveat is that, because the Denver MSA’s sprawl is likely to remain contained during 2020-2030, this method will likely over-predict urban growth. Nevertheless, the over-predicted cells are likely to be close to the actual urban growth, and thus can still be a good indicator.

In the results, we label the developments allocated using the first method “Very Likely”, and those allocated using the second method “Likely”.
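As a minimal sketch of how the two allocation rules interact, the example below labels a handful of made-up scores. The cell IDs, the scores, and the count of three projected cells are hypothetical and not model output; only the 0.75 threshold and the label names come from the text.

```python
import pandas as pd

# Hypothetical model scores for six undeveloped cells (illustration only)
scores = pd.Series(
    [0.95, 0.91, 0.88, 0.80, 0.60, 0.30],
    index=["a", "b", "c", "d", "e", "f"],
    name="developed_proba",
)

threshold = 0.75  # cutoff for "Likely"
n_projected = 3   # hypothetical number of cells implied by population growth

labels = pd.Series("Not Likely", index=scores.index, name="likely")
labels.loc[scores > threshold] = "Likely"
# The highest-scoring cells are promoted, overriding the threshold label
labels.loc[scores.nlargest(n_projected).index] = "Very Likely"

print(labels.to_dict())
```

Because the top-scoring cells are labeled last, a cell that passes the threshold and also ranks within the projected count ends up “Very Likely” rather than “Likely”.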

import numpy as np
import pandas as pd
import geopandas as gpd
import altair as alt

from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.model_selection import train_test_split

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer, StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline, make_pipeline

import matplotlib.pyplot as plt
from sklearn import metrics

from joblib import load

crs = "EPSG:2232"

# Load data where we left off the last time
fishnet = gpd.read_file("../data/fishnets/with_counties.geojson")
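# Declare (do not reproject) the CRS; this assumes the stored coordinates are already in EPSG:2232
fishnet = fishnet.set_crs(crs, allow_override=True)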

comprehensive_model = load("../data/model.joblib")

# Predictors for business-as-usual
future_vars = {
    "population_2020": "population",
    "land_cover_type_2019": "land_cover_type",
    "highway_distance": "highway_distance",
    "rail_station_distance": "rail_station_distance",
    "facility_distance": "facility_distance",
    "lag_developed_2019": "lag_developed",
    "lag_population_2020": "lag_population",
    "county": "county",
    "developed_2019": "init_developed",
}

# Predictors for the light-rail scenario
alternative_vars = {
    "population_2020": "population",
    "land_cover_type_2019": "land_cover_type",
    "highway_distance": "highway_distance",
    "rail_station_future_distance": "rail_station_distance",
    "facility_future_distance": "facility_distance",
    "lag_developed_2019": "lag_developed",
    "lag_population_2020": "lag_population",
    "county": "county",
    "developed_2019": "init_developed",
}

future = (
    fishnet[list(future_vars.keys()) + ["geometry"]]
    .query("developed_2019 == False")
    .copy()
    .rename(columns=future_vars)
)

alternative_future = (
    fishnet[list(alternative_vars.keys()) + ["geometry"]]
    .query("developed_2019 == False")
    .copy()
    .rename(columns=alternative_vars)
)

developed_in_20 = (
    fishnet[list(future_vars.keys()) + ["geometry"]]
    .query("developed_2019 == True")
    .copy()
    .rename(columns=future_vars)
)

future["developed_proba"] = comprehensive_model.predict_proba(future)[:, 1]
alternative_future["developed_proba"] = comprehensive_model.predict_proba(
    alternative_future
)[:, 1]

The calculations below show that from 2010 to 2020, 68 cells, or 24,977 acres, were developed. According to the population projection, at the same rate of development per new resident, about 48 cells, or 17,933 acres, will be developed by 2030 (the acreage is computed from the unrounded cell count).

county_names = fishnet.county.unique().tolist()
projections = pd.read_csv("../data/population/projection.csv")
projections_2030 = (
    projections.query("(year == 2030) & (county.isin(@county_names))")
    .groupby("county")["totalPopulation"]
    .sum()
)

population_2030 = projections_2030.sum()
population_2010 = fishnet.population_2010.sum()
population_2020 = fishnet.population_2020.sum()

pop_change_10_20 = population_2020 - population_2010
new_dev_10_20 = fishnet.query("developed_2009 == False").copy().developed_2019.sum()
dev_dens = pop_change_10_20 / new_dev_10_20

pop_change_20_30 = population_2030 - population_2020
new_dev_20_30 = pop_change_20_30 / dev_dens

print(
    f"During 2010 to 2020, {int(new_dev_10_20)} cells, or {round(new_dev_10_20 * 4000 * 4000 / 43560)} acres were developed."
)
print(
    f"According to the population projection, by the same rate, {int(new_dev_20_30)} cells\nor {round(new_dev_20_30 * 4000 * 4000 / 43560)} acres will be developed by 2030."
)
During 2010 to 2020, 68 cells, or 24977 acres were developed.
According to the population projection, by the same rate, 48 cells
or 17933 acres will be developed by 2030.
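The arithmetic behind these figures can be isolated in two small helpers. This is only a sketch: the function names are hypothetical, the 4000 ft cell side and 43,560 square feet per acre are taken from the conversion in the code above, and the population changes in the last call are made up to illustrate the scaling.

```python
SQFT_PER_ACRE = 43_560
CELL_SIDE_FT = 4_000  # fishnet cell side, as used in the conversion above


def cells_to_acres(n_cells: float) -> int:
    """Convert a count of 4000 ft x 4000 ft cells to whole acres."""
    return round(n_cells * CELL_SIDE_FT**2 / SQFT_PER_ACRE)


def project_new_cells(pop_change_past, new_cells_past, pop_change_future):
    """Scale future development by the past ratio of new residents per new cell."""
    people_per_new_cell = pop_change_past / new_cells_past
    return pop_change_future / people_per_new_cell


print(cells_to_acres(68))  # 24977, matching the 2010-2020 figure printed above

# Hypothetical population changes: half the past growth yields half the cells
print(project_new_cells(400_000, 68, 200_000))
```

The projected cell count is generally fractional, so truncating it with `int()` for display while converting the unrounded value to acres can make the two printed numbers look slightly inconsistent.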
threshold = 0.75

future["likely"] = "Not Likely"
future.loc[future["developed_proba"] > threshold, "likely"] = "Likely"
most_likely = future.nlargest(int(new_dev_20_30), "developed_proba")
future.loc[most_likely.index, "likely"] = "Very Likely"

alternative_future["likely"] = "Not Likely"
alternative_future.loc[
    alternative_future["developed_proba"] > threshold, "likely"
] = "Likely"
most_likely = alternative_future.nlargest(int(new_dev_20_30), "developed_proba")
alternative_future.loc[most_likely.index, "likely"] = "Very Likely"

# Join prediction with cells already developed in 2019
developed_in_20["likely"] = "Already Developed"
developed_in_20["developed_proba"] = 1

future_comprehensive = pd.concat([
    future,
    developed_in_20
])

alternative_future_comprehensive = pd.concat([
    alternative_future,
    developed_in_20
])

The map below shows the urban growth predictions for the business-as-usual scenario.

from assets.colors import (
    palette_hero,
    palette_green,
    palette_primary
)

def altair_fishnet(fishnet, column, color_dict, legend_title, title):
    chart = alt.Chart(
        fishnet.to_crs(4326)[["geometry", column]]
    ).mark_geoshape().encode(
        color=alt.Color(
            f"{column}:N",
            title=legend_title,
            scale=alt.Scale(
                domain=list(color_dict.keys()), range=list(color_dict.values())
            ),
        )
    ).properties(
        width=400, height=330, title=title
    )
    return chart

color_dict = {
    "Not Likely": "#eeeeee",
    "Likely": palette_green,
    "Very Likely": palette_primary,
    "Already Developed": "#dddddd"
}

alt.data_transformers.disable_max_rows()

altair_fishnet(
    future_comprehensive,
    "likely",
    color_dict,
    "Likelihood of Development",
    "Urban Growth Prediction 2020-2030, Business-as-Usual Scenario"
)

According to the model, the most likely urban developments will occur rather sporadically on the fringes of the current urban area and along key corridors in the southeast. The “Very Likely” cells are closer to existing urban areas than the “Likely” cells.

What about in the light-rail scenario?

prediction = altair_fishnet(
    alternative_future_comprehensive,
    "likely",
    color_dict,
    "Likelihood of Development",
    "Urban Growth Prediction 2020-2030, Light-Rail Scenario"
)

rail_stations_future_url = "../data/other/rail_stations_new.shp"
rail_stations_future = gpd.read_file(rail_stations_future_url).to_crs(crs)

def points_to_altair(gdf):
    gdf = gdf.to_crs(4326)
    gdf["x"] = gdf.geometry.x
    gdf["y"] = gdf.geometry.y
    return (
        alt.Chart(gdf)
        .mark_circle(size=3)
        .encode(
            longitude="x",
            latitude="y",
            color=alt.value("white"),
            stroke=alt.value("white"),
        )
    )


overlay = points_to_altair(rail_stations_future)

prediction + overlay

The proposed light rail will indeed make it more likely for cells proximate to the stations to be developed.

However, most cells near the new stations are marked green in the above map, which means only “Likely” rather than “Very Likely”. Given the Denver MSA’s growth rate, fewer than 1 in 10 cells marked “Likely” by the 0.75 threshold will actually be developed.

On the other hand, the cells marked “Very Likely” are still concentrated in locations similar to those in the business-as-usual scenario, meaning that for the Denver MSA, factors other than light rail play a bigger role in determining urban growth. The exception is Strasburg, where a new light rail station combines with existing urban development, resulting in two cells marked “Very Likely” for future development.
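One way to make this comparison concrete is to cross-tabulate the labels each cell receives under the two scenarios. The sketch below uses made-up labels for five hypothetical cell IDs; it shows the technique only and does not reproduce the results above.

```python
import pandas as pd

# Hypothetical labels for the same five cells under each scenario
cell_ids = [101, 102, 103, 104, 105]
baseline = pd.Series(
    ["Very Likely", "Likely", "Not Likely", "Not Likely", "Likely"],
    index=cell_ids,
    name="business_as_usual",
)
light_rail = pd.Series(
    ["Very Likely", "Likely", "Likely", "Not Likely", "Very Likely"],
    index=cell_ids,
    name="light_rail",
)

# Rows: baseline label; columns: light-rail label; values: number of cells
transitions = pd.crosstab(baseline, light_rail)
print(transitions)

# Cells whose label differs between the two scenarios
changed = baseline.index[baseline != light_rail]
print(list(changed))
```

Since `future` and `alternative_future` are filtered from the same fishnet rows, their `likely` columns should share an index, so the same call would presumably work on them directly. Off-diagonal counts then show how many cells change label when the new stations are added.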

Overall, compared to the business-as-usual scenario, the new light rail will likely not cause significant sprawl along its route. It will, however, promote some urban growth around new stations in existing towns, especially Strasburg, offering opportunities for transit-oriented development.